Solved: Re: Spark


Hi,

 

I have an issue with Spark; the job fails with this error message:

 

scala> someDF.write.mode(SaveMode.Append).parquet("file:///data/bbox/tmp")
[Stage 0:>                                                          (0 + 2) / 2]
18/06/05 12:37:39 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, dec-bb-dl03.bbox-dec.lab.oxv.fr, executor 1): java.io.IOException: Mkdirs failed to create file:/data/bbox/tmp/_temporary/0/_temporary/attempt_201806051237_0000_m_000000_0 (exists=false, cwd=file:/yarn/nm/usercache/hdfs/appcache/application_1527756804026_0065/container_e33_1527756804026_0065_01_000002)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:447)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:926)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
        at parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:225)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:311)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetRelation.scala:94)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anon$3.newInstance(ParquetRelation.scala:286)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:129)
        at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:255)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
18/06/05 12:37:39 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2, dec-bb-dl03.bbox-dec.lab.oxv.fr, executor 1): java.io.IOException: Mkdirs failed to create file:/data/bbox/tmp/_temporary/0/_temporary/attempt_201806051237_0000_m_000000_1 (exists=false, cwd=file:/yarn/nm/usercache/hdfs/appcache/application_1527756804026_0065/container_e33_1527756804026_0065_01_000002)
        ... (identical stack trace to the one above)

 

 

We use CDH 5.14 with the Spark version bundled in CDH (1.6.0); we suspect a version incompatibility issue.

 

First, I tried changing the directory permissions (777, or granting write access to the hadoop group), but it didn't work.
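For reference, the permission changes I tried were along these lines (sketched here on a scratch directory so it can run anywhere; on the cluster the target was /data/bbox/tmp on the worker node, and the group was hadoop rather than the current login group):

```shell
# Stand-in for /data/bbox/tmp on the worker node
dir=$(mktemp -d)

# Attempt 1: make the directory world-writable
chmod -R 777 "$dir"

# Attempt 2: give a group write access (here the current user's primary
# group stands in for the hadoop group used on the cluster)
chgrp -R "$(id -gn)" "$dir"
chmod -R g+w "$dir"

# Verify the resulting mode (GNU stat)
stat -c '%a' "$dir"    # prints 777
```

Neither variant changed the outcome: the executor still fails with "Mkdirs failed to create" under that path.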

 

Any idea?

 

Julien.

 

 

 

 


